Abstract


This paper studies the estimation of models in which the set of
instruments is not, in fact, orthogonal to the residuals. I first show that,
in overidentified models of this type, one can generally obtain arbitrary
estimates by varying the weights given to different instruments. I then
weaken the assumptions of instrumental variable estimation by allowing for
nondegenerate prior distributions over the product of instruments and
residuals. If the variance covariance matrix of this distribution is
diagonal, the estimates which minimize the impact of misspecification are
shown to lie inside the polyhedron of estimates from the exactly identified
submodels.


Introduction 

Consider the single equation model:

$$Y = X\beta + \varepsilon \tag{1}$$

where $Y$ is a $T \times 1$ vector, $X$ a $T \times k$ matrix, $\beta$ a $k \times 1$ vector of
parameters of interest and $\varepsilon$ a $T \times 1$ vector of disturbances. Often economic
reasoning predicts that $\varepsilon_t$ is uncorrelated with a series of variables $Z_{it}$
(which may include $X$'s). It is then natural to estimate the vector $\beta$ by the method
of instrumental variables proposed by Reiersol (1945), discussed in detail in
Sargan (1958) and generalized by Hansen (1982). This method considers the
sample inner products of the instruments and residuals, $Z_i'(Y - X\beta)$, where
$Z_i$ is the vector of observations on instrument $i$. It then sets $k$ linear
combinations of these products equal to zero so that

$$W Z'(Y - X\beta) = 0 \tag{2}$$

where $Z$ is a $T \times m$ matrix of instruments, $m \geq k$, and $W$ is a $k \times m$ weighting
matrix of rank $k$.
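
As a minimal numerical sketch of (2), the following solves the $k$ moment equations for $\beta$ on simulated data. The data-generating process, the sample size and the choice of the two stage least squares weights for $W$ are illustrative assumptions, not part of the paper.

```python
import numpy as np

def iv_estimate(W, Z, X, Y):
    """Solve W Z'(Y - X b) = 0 for b, i.e. b = (W Z'X)^{-1} W Z'Y."""
    ZX = Z.T @ X                      # m x k cross moments
    ZY = Z.T @ Y                      # m-vector
    return np.linalg.solve(W @ ZX, W @ ZY)

# Simulated data (hypothetical): T observations, k regressors, m instruments.
rng = np.random.default_rng(0)
T, k, m = 500, 2, 3
Z = rng.normal(size=(T, m))
X = Z @ rng.normal(size=(m, k)) + rng.normal(size=(T, k))
Y = X @ np.array([1.0, -2.0]) + rng.normal(size=T)

# One admissible k x m weighting matrix: X'Z (Z'Z)^{-1}.
W = np.linalg.solve(Z.T @ Z, Z.T @ X).T

b_hat = iv_estimate(W, Z, X, Y)
moments = W @ Z.T @ (Y - X @ b_hat)   # the k weighted moments in (2)
```

Any other $k \times m$ matrix $W$ for which $WZ'X$ is invertible can be passed to `iv_estimate` in the same way.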

The hypothesis that the expected value of $Z_{it}\varepsilon_t$ is exactly zero is
probably false for most economic models. This explains in part why, in
empirical papers, this hypothesis is often rejected by Hausman (1978) tests and
other specification tests. In particular, such rejections are reported by
Hansen and Singleton (1983), Mankiw, Rotemberg and Summers (1982), and Pindyck
and Rotemberg (1983). After all, the models are only an approximation to
reality. The lack of concern expressed over these rejections must mean that
the authors imagine on a priori grounds that the inconsistency of the
resulting estimates must be small. This belief may be based on Fischer's
(1961) "proximity theorem," which states that, for a fixed $W$, as the mean of
$\varepsilon_t Z_{it}$ goes to zero the inconsistency of $\hat\beta$ disappears in a continuous
fashion. This paper argues that this optimism may be unfounded. I show that
when overidentified models (i.e. models where $m > k$) are misspecified even
slightly, the estimated $\beta$'s may be extremely far from the true $\beta$'s. This
result does not contradict Fischer's result directly, because I keep the mean of
$\varepsilon_t Z_{it}$ fixed and consider changes in the weighting matrix $W$. If the
means of $\varepsilon_t Z_{it}$ differ sufficiently across instruments, one can obtain
essentially arbitrary $\hat\beta$'s by varying $W$.

Methods have been proposed for selecting weighting matrices that minimize
the asymptotic covariance matrix of the $\hat\beta$'s under the assumption that the
model is correctly specified. In particular, if the $\varepsilon_t$'s are i.i.d.
then the "optimal" $W$ is $X'Z(Z'Z)^{-1}$ and the resulting estimator is obtained
by two stage least squares. Here I propose a different estimation procedure.
This procedure is designed to minimize the impact of misspecification. I
assume that $Z_i'\varepsilon/T$ converges to $V_i$ as $T$ goes to infinity. However,
instead of assuming $V_i$ is zero, I treat $V_i$ as an unknown random variable
from the point of view of the econometrician. I assume that $V_i$ has mean zero
and variance $\sigma_i^2$ (so that, on average, the estimates are consistent). Also
the expected value of $V_i V_j$ is zero for $i \neq j$, so that the asymptotic biases
from the different instruments are uncorrelated. Under these circumstances
I discuss the instrumental variables estimator which minimizes the asymptotic
covariance matrix of $\hat\beta$. I show that this optimal $\hat\beta$ is strictly inside the
polyhedron whose vertices are obtained from estimating the $\binom{m}{k}$ exactly
identified submodels. I also show that the estimates obtained from two stage
least squares are not necessarily inside this polyhedron. The paper proceeds
as follows. Section II shows the arbitrariness of $\hat\beta$ when the model is
misspecified, while in Section III my solution to this arbitrariness is based
on priors over $V$. Section IV concludes.

II. The Arbitrariness of the Estimated Parameters


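Let $\hat\beta$ be the value of $\beta$ which satisfies (2). Then:

$$\hat\beta = (W Z'X)^{-1} W Z'Y \tag{3}$$

Let $\hat\beta^{j_1 j_2 \ldots j_k}$, for $1 \leq j_1 < j_2 < \cdots < j_k \leq m$, be the estimate of
$\beta$ obtained from using the instruments $Z_{j_1}, \ldots, Z_{j_k}$. This estimate is given
by

$$\hat\beta^{j_1 \ldots j_k} =
\begin{bmatrix}
Z_{j_1}'X_1 & \cdots & Z_{j_1}'X_k \\
\vdots & & \vdots \\
Z_{j_k}'X_1 & \cdots & Z_{j_k}'X_k
\end{bmatrix}^{-1}
\begin{bmatrix}
Z_{j_1}'Y \\ \vdots \\ Z_{j_k}'Y
\end{bmatrix} \tag{4}$$

where $X_i$ is the $i$th column of $X$.

Proposition 1

$$\hat\beta = \sum_{1 \leq j_1 < \cdots < j_k \leq m} \alpha_{j_1 \ldots j_k}\, \hat\beta^{j_1 \ldots j_k} \tag{5}$$

where the $\alpha$'s sum to one.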


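Proof

Consider the $i$th element of $\hat\beta$. It is given by $A_i/B$. $A_i$ is the inner
product of the vector whose elements are the cofactors of the $i$th column of $(WZ'X)$
with $WZ'Y$, while $B$ is the determinant of $WZ'X$. Thus $A_i$ is the determinant
of a matrix formed by deleting the $i$th column of $(WZ'X)$ and replacing it by
$WZ'Y$. Using the Cauchy-Binet identity as in Gantmacher (1959) the determinant
of $WZ'X$ can be written as

$$|WZ'X| = \sum_{1 \leq j_1 < j_2 < \cdots < j_k \leq m}
\begin{vmatrix}
W_{1,j_1} & \cdots & W_{1,j_k} \\
\vdots & & \vdots \\
W_{k,j_1} & \cdots & W_{k,j_k}
\end{vmatrix}
\begin{vmatrix}
Z_{j_1}'X_1 & \cdots & Z_{j_1}'X_k \\
\vdots & & \vdots \\
Z_{j_k}'X_1 & \cdots & Z_{j_k}'X_k
\end{vmatrix} \tag{6}$$

where $W_{ij}$ is the typical element of $W$. That is, the determinant can be written
as the sum of the products of determinants obtained from selecting $k$ columns of $W$
and the corresponding $k$ rows of $Z'X$. Similarly $A_i$ is given by:

$$A_i = \sum_{1 \leq j_1 < \cdots < j_k \leq m}
\begin{vmatrix}
W_{1,j_1} & \cdots & W_{1,j_k} \\
\vdots & & \vdots \\
W_{k,j_1} & \cdots & W_{k,j_k}
\end{vmatrix}
\begin{vmatrix}
Z_{j_1}'X_1 & \cdots & Z_{j_1}'X_{i-1} & Z_{j_1}'Y & Z_{j_1}'X_{i+1} & \cdots & Z_{j_1}'X_k \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
Z_{j_k}'X_1 & \cdots & Z_{j_k}'X_{i-1} & Z_{j_k}'Y & Z_{j_k}'X_{i+1} & \cdots & Z_{j_k}'X_k
\end{vmatrix}$$

By Cramer's rule applied to (4), the second determinant in each term equals
$\hat\beta_i^{j_1 \ldots j_k}$ times the determinant of the corresponding $k$ rows of $Z'X$.
Hence, using (4),

$$\hat\beta_i = \sum_{1 \leq j_1 < \cdots < j_k \leq m} \alpha_{j_1 \ldots j_k}\, \hat\beta_i^{j_1 \ldots j_k}$$

where

$$\alpha_{j_1 \ldots j_k} =
\frac{
\begin{vmatrix}
W_{1,j_1} & \cdots & W_{1,j_k} \\
\vdots & & \vdots \\
W_{k,j_1} & \cdots & W_{k,j_k}
\end{vmatrix}
\begin{vmatrix}
Z_{j_1}'X_1 & \cdots & Z_{j_1}'X_k \\
\vdots & & \vdots \\
Z_{j_k}'X_1 & \cdots & Z_{j_k}'X_k
\end{vmatrix}
}{
\displaystyle\sum_{1 \leq l_1 < l_2 < \cdots < l_k \leq m}
\begin{vmatrix}
W_{1,l_1} & \cdots & W_{1,l_k} \\
\vdots & & \vdots \\
W_{k,l_1} & \cdots & W_{k,l_k}
\end{vmatrix}
\begin{vmatrix}
Z_{l_1}'X_1 & \cdots & Z_{l_1}'X_k \\
\vdots & & \vdots \\
Z_{l_k}'X_1 & \cdots & Z_{l_k}'X_k
\end{vmatrix}
} \tag{7}$$

Q.E.D.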
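
Proposition 1 can be checked numerically. The sketch below, on simulated data with one deliberately invalid instrument (an illustrative assumption), computes the weights in (7) from the subdeterminants of $W$ and $Z'X$, and compares the weighted sum in (5) with the direct solution (3). The function name `submodel_weights` is hypothetical.

```python
import itertools
import numpy as np

def submodel_weights(W, Z, X, Y):
    """Return the alphas of (7) and the submodel estimates of (4), one per k-subset."""
    k, m = W.shape
    ZX, ZY = Z.T @ X, Z.T @ Y
    terms, betas = [], []
    for J in itertools.combinations(range(m), k):
        J = list(J)
        # product of the two k x k determinants appearing in (6)
        terms.append(np.linalg.det(W[:, J]) * np.linalg.det(ZX[J, :]))
        # exactly identified estimate using only the instruments in J
        betas.append(np.linalg.solve(ZX[J, :], ZY[J]))
    terms = np.array(terms)
    return terms / terms.sum(), np.array(betas)

rng = np.random.default_rng(1)
T, k, m = 200, 2, 4
Z = rng.normal(size=(T, m))
X = Z @ rng.normal(size=(m, k)) + rng.normal(size=(T, k))
# The first instrument enters the residual, so the model is misspecified.
Y = X @ np.array([1.0, -2.0]) + rng.normal(size=T) + 0.3 * Z[:, 0]
W = rng.normal(size=(k, m))           # an arbitrary weighting matrix

alphas, betas = submodel_weights(W, Z, X, Y)
beta_direct = np.linalg.solve(W @ Z.T @ X, W @ Z.T @ Y)   # equation (3)
beta_combo = alphas @ betas                               # equation (5)
```

With an arbitrary $W$ some entries of `alphas` are typically negative, which is what allows the combined estimate to leave the polyhedron of submodel estimates.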


If the model is correctly specified, Hansen (1982) shows that any $W$ of full
rank leads to consistent estimates. Thus even when the matrices of second cross
moments between $Z_{j_1}, \ldots, Z_{j_k}$ and $X$ are positive definite, the submatrices of $W$
can have negative determinants, leading to negative $\alpha$'s in (7).

If the model is correctly specified, that is, if $Z_i'\varepsilon/T$ goes to zero with
probability one, then every $\hat\beta^{j_1 \ldots j_k}$ goes to $\beta$ with probability one. In this
case the signs of the $\alpha$'s in (7) have no importance: as long as the $\alpha$'s sum to
one, $\hat\beta$ is asymptotically equal to $\beta$ with probability one.

On the other hand, as mentioned in the introduction, many economic models
appear to be misspecified. The Hausman (1978) test detects misspecification,
i.e. differences between $Z'\varepsilon/T$ and zero, when the $\hat\beta^{j_1 \ldots j_k}$'s are significantly
different from each other. Newey (1983) shows that the test proposed by Hansen
(1982) is actually equivalent to the Hausman test under certain circumstances.
So failures of these tests mean that the estimates obtained from exactly
identified submodels (i.e. from using only $(Z_{j_1}, \ldots, Z_{j_k})$ as instruments) differ.
Proposition 2 establishes that if these estimates differ enough asymptotically
then the value of $\hat\beta$ is arbitrary.

Proposition 2. Suppose the $\hat\beta^{j_1 \ldots j_k}$ converge asymptotically to constants.
Moreover, there exist $k+1$ instruments such that the matrix whose columns are the
estimates of the $(k+1)$ exactly identified submodels which use these instruments
is of full rank. Then, for every $k \times 1$ vector $\gamma$, one can construct a matrix $W$
which makes $\hat\beta$ equal to $\gamma$.

Proof

First let $W$ have nonzero elements only in the columns which correspond to
the $(k+1)$ instruments with the desired property. The $(k+1)$ determinants
obtained by selecting $k$ of the nonzero columns of $W$ are clearly arbitrary.
For instance, by multiplying the first $k$ columns with this property by $\lambda$ and
the last column by $1/\lambda^{k-1}$ one multiplies the first subdeterminant by
$\lambda^k$ and leaves the others unchanged. Moreover, if $k$ is even, multiplying
the first $k$ nonzero columns of $W$ by $(-1)$ changes the sign of all determinants
except for the one of the first submatrix. Finally, multiplying a row by $(-1)$
changes the sign of all the determinants. So, these last two operations allow
one to change just the sign of the first determinant even when $k$ is even. A
similar argument allows one to change at will any of the other determinants.
Since $Z'X$ is fixed, one can thus change at will the numerators of the $\alpha$'s in
(7). In particular one can pick the last subdeterminant of $W$ in such a way
that the denominator of the $\alpha$'s in (7) is equal to one. Then $\hat\beta$ can be
rewritten as

$$\hat\beta = \hat\beta^{L} + C\delta$$

Here $\hat\beta^{L}$ is the estimate of $\beta$ obtained using only the instruments which
correspond to the last $k$ nonzero columns of $W$. $C$ is a matrix whose columns
are given by the difference between the $k$ $\hat\beta$'s obtained from the other exactly
identified submodels and $\hat\beta^{L}$. Finally $\delta$ is a $k \times 1$ vector consisting of the $k$
arbitrary $\alpha$'s. If the $(k+1)$ instruments have the desired property, $C$ is of rank
$k$. Then one can obtain $\hat\beta$ equal to $\gamma$ by setting $\delta$ equal to $C^{-1}(\gamma - \hat\beta^{L})$.
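
The construction in the proof can be illustrated in the simplest case, $k = 1$ and $m = 2$, where $W$ is a $1 \times 2$ row. The sketch below uses simulated data with a slight misspecification (an illustrative assumption) and picks $W$ so that the single weight on the first submodel solves $\alpha \hat\beta^1 + (1-\alpha)\hat\beta^2 = \gamma$. The helper `weights_hitting_target` is a hypothetical name.

```python
import numpy as np

def weights_hitting_target(Z, x, y, target):
    """For k = 1, m = 2: return a 1 x 2 matrix W whose IV estimate equals `target`."""
    zx, zy = Z.T @ x, Z.T @ y              # each has length 2
    b1, b2 = zy / zx                       # the two exactly identified estimates
    if np.isclose(b1, b2):
        raise ValueError("submodel estimates coincide; only their common value is attainable")
    alpha1 = (target - b2) / (b1 - b2)     # weight on b1; the weight on b2 is 1 - alpha1
    # Scaling by 1/zx makes the denominator W Z'x equal to one, as in the proof.
    return np.array([[alpha1 / zx[0], (1.0 - alpha1) / zx[1]]])

rng = np.random.default_rng(2)
T = 1000
Z = rng.normal(size=(T, 2))
x = Z @ np.array([1.0, 1.0]) + rng.normal(size=T)
eps = rng.normal(size=T) + 0.05 * Z[:, 0] - 0.05 * Z[:, 1]   # slight misspecification
y = 1.0 * x + eps

for target in (-50.0, 0.0, 1.0, 1e3):
    W = weights_hitting_target(Z, x, y, target)
    beta_hat = (W @ Z.T @ y) / (W @ Z.T @ x)
    print(target, beta_hat.item())
```

The closer the two submodel estimates are to each other, the larger in absolute value the weights needed to reach a distant target, but any target remains attainable as long as the estimates differ.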

Proposition 1 basically shows that $\hat\beta$ is a linear combination of the $\hat\beta$'s obtained
from the exactly identified submodels. Moreover, since each one of these $\hat\beta$'s
is consistent, the sum of the weights on these $\hat\beta$'s must be one to ensure that $\hat\beta$
is consistent. However, the individual weights are arbitrary except for their
need to sum to one. So one weight can be large and positive as long as another
is large and negative. As soon as any two $\hat\beta_i$'s from exactly identified
submodels differ, one can thus obtain an arbitrary value for $\hat\beta_i$ by
varying the weights on the two exactly identified $\hat\beta_i$'s.

This arbitrariness of $\hat\beta$ is disturbing for a number of reasons. First, a
number of different $W$'s have been proposed for their "optimal" properties when
the model is correctly specified. This, unfortunately, gives econometricians
quite a bit of latitude in reporting estimates of models which fail Hausman-type
tests. In particular, under the assumption of conditional homoskedasticity
the optimal $W$ is $X'Z(Z'Z)^{-1}$, which gives the two stage least squares estimator.
Under conditional heteroskedasticity, instead, Hansen (1982) shows that the
optimal $W$ is obtained in two stages. In the first stage any $W$ of full rank
can be used. Then the residuals from the first stage estimation are used to
construct the optimal $W$. Proposition 2 makes it clear that this second stage
$W$ will, if the model is misspecified, vary depending on the first stage $W$
that is chosen. When simultaneous equations are being estimated, $W$
will also be different depending on whether three stage least squares or
iterative three stage least squares is selected.

The second reason the arbitrariness of $\hat\beta$ is disturbing is that it suggests
nothing can be learned about $\beta$ even when the model is only slightly
misspecified. This is intuitively implausible. My discussion of the
situations in which something can be learned is relegated to the next
section. In the rest of this section I consider whether the arbitrariness of
$\hat\beta$ disappears when, instead of using the generalized method of moments, one
minimizes $(Y - X\beta)'Z\,W\,Z'(Y - X\beta)$ where $W$ is an $m \times m$ positive definite weighting
matrix. I show that this isn't so by focusing on an example. In this
example $k$ is equal to one while $m$ is two.

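Thus there are two exactly identified submodels. One uses only $Z_1$ as an
instrument while the other uses only $Z_2$. The instrumental variable
estimates from the two submodels are given by

$$\hat\beta^1 = \frac{Z_1'Y}{Z_1'X} = \beta + \frac{(Z_1'\varepsilon)/T}{(Z_1'X)/T},
\qquad
\hat\beta^2 = \frac{Z_2'Y}{Z_2'X} = \beta + \frac{(Z_2'\varepsilon)/T}{(Z_2'X)/T} \tag{8}$$

The estimates $\hat\beta^1$ and $\hat\beta^2$ become good approximations to the true $\beta$ as $T$
becomes large if $Z_1'\varepsilon/Z_1'X$ and $Z_2'\varepsilon/Z_2'X$ converge respectively to $\Sigma_1$ and $\Sigma_2$, which
are small relative to $\beta$. On the other hand consider the estimate $\hat\beta$ which
minimizes

$$(Y - X\beta)' \begin{bmatrix} Z_1 & Z_2 \end{bmatrix}
\begin{bmatrix} a & b \\ b & c \end{bmatrix}
\begin{bmatrix} Z_1' \\ Z_2' \end{bmatrix} (Y - X\beta) \tag{9}$$

where $W$ has been chosen without loss of generality to be symmetric. $\hat\beta$ is
given by:

$$\hat\beta = \frac{(aX'Z_1 + bX'Z_2)Z_1'Y + (bX'Z_1 + cX'Z_2)Z_2'Y}{(aX'Z_1 + bX'Z_2)Z_1'X + (bX'Z_1 + cX'Z_2)Z_2'X}
= \phi\,\hat\beta^1 + (1 - \phi)\,\hat\beta^2$$

where

$$\phi = \frac{(aX'Z_1 + bX'Z_2)Z_1'X}{(aX'Z_1 + bX'Z_2)Z_1'X + (bX'Z_1 + cX'Z_2)Z_2'X} \tag{10}$$

so $\hat\beta$ is a weighted sum of $\hat\beta^1$ and $\hat\beta^2$ where the weights add to one. So,
asymptotically,

$$\hat\beta = \beta + \Sigma_1 + (\Sigma_2 - \Sigma_1)(1 - \phi) \tag{11}$$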
Unfortunately $\phi$ can be any real number, so that if $\Sigma_2$ is different from $\Sigma_1$, $\hat\beta$ is
arbitrary. This can be seen as follows: by normalizing the $Z$'s one can make both
$Z_i'X/T$ converge to one. Then $\phi$ is equal to $(a+b)/(a+2b+c)$. Let $a$ equal
one, $c$ equal $(1+\mu)$ where $\mu$ is bigger than $-.5$, and $b$ equal $\nu - \sqrt{1+\mu}$.
As long as $\nu$ is small and positive the resulting weighting matrix is positive
definite, since its determinant is $2\nu\sqrt{1+\mu} - \nu^2$. Then:

$$\phi = \frac{1 + \nu - \sqrt{1+\mu}}{2 + \mu - 2\sqrt{1+\mu} + 2\nu} \tag{12}$$

and

$$\lim_{\substack{\nu \to 0 \\ \mu \to 0}} \phi
= \frac{1 + \nu - (1 + \mu/2)}{2 + \mu - 2(1 + \mu/2) + 2\nu}
= \frac{\nu - \mu/2}{2\nu} \tag{13}$$

So, for $\mu$ positive, $\phi$ can be induced to be in the open interval $(-\infty, 1/2)$
by varying $\nu$. Similarly, for $\mu$ negative, $\phi$ can be in the open interval
$(1/2, \infty)$. On the other hand, making $c$ equal $a$ and choosing $b$ equal to zero, $\phi$
becomes 1/2.
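
The behavior of $\phi$ in (12) is easy to tabulate. The sketch below evaluates $\phi = (a+b)/(a+2b+c)$ for the weights used in the text and checks positive definiteness of the weighting matrix; the particular values of $\mu$ and $\nu$ are illustrative choices, not taken from the paper.

```python
import math

def phi(a, b, c):
    """Weight on the first submodel when both Z_i'X/T converge to one."""
    return (a + b) / (a + 2.0 * b + c)

def weights(mu, nu):
    """The weights of the text: a = 1, c = 1 + mu, b = nu - sqrt(1 + mu)."""
    return 1.0, nu - math.sqrt(1.0 + mu), 1.0 + mu

def is_positive_definite(a, b, c):
    """A symmetric 2 x 2 matrix [[a, b], [b, c]] is positive definite iff a > 0 and ac - b^2 > 0."""
    return a > 0 and a * c - b * b > 0

for mu, nu in [(0.1, 0.01), (0.1, 0.001), (-0.1, 0.001), (0.0, 0.001)]:
    a, b, c = weights(mu, nu)
    print(mu, nu, is_positive_definite(a, b, c), phi(a, b, c))
```

Shrinking $\nu$ with $\mu$ held fixed pushes $\phi$ further from 1/2 in the direction determined by the sign of $\mu$, while the weighting matrix stays positive definite throughout.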

III. The Study of Misspecified Models

The previous section showed that if the model is misspecified, one can
choose weighting matrices to obtain arbitrary parameters. This is true even if
the model is only slightly misspecified in the sense that the $\hat\beta^{j_1 \ldots j_k}$'s
are close to $\beta$. As long as they are slightly different from each other,
Proposition 2 holds. However, if the $\hat\beta^{j_1 \ldots j_k}$ are, in an economic sense, very
similar to each other, the statistical rejection of their equality should not
be viewed as a major problem. The question remains, however, which $W$ to use even
in this case.

One possibility is to view the failure of specification tests as a failure
of a specific set of $m-k$ instruments under the maintained assumption that the
other $k$ instruments have the untestable property that $\lim_{T \to \infty} Z_i'\varepsilon/T = 0$.
Then, the best weighting matrix has nonzero elements only in the columns
corresponding to the "valid" instruments. While this procedure may be
appropriate in certain contexts, it is not so in general. In macroeconomics the
$Z_i$'s are typically lagged values of various variables. Which of these
lags is most appropriate is generally hard to decide. In panel data the
instruments are usually individual characteristics like age, schooling and the
wage of the working spouse. It might be thought that the last characteristic
is a worse instrument in a labor supply equation, for instance. However, it
would seem that even the first two characteristics are probably correlated
with the taste for working.

So, in the usual context it is difficult to assert that one is sure $k$ of
the instruments are indeed appropriate ones. Here, I propose a different mode
of analysis of models which fail specification tests. In particular I propose
that the polyhedron composed of the $\hat\beta$'s from the exactly identified submodels
be studied. If this polyhedron is large, in that the various $\hat\beta$'s have very
different economic implications, the misspecification makes it hard to draw
behavioral conclusions from the data. On the other hand, if the polyhedron
is small, the statistical significance of misspecification doesn't stand in
the way of drawing behavioral implications. The focus on the polyhedron is
motivated by the fact that, under assumptions strictly weaker than
$\lim_{T \to \infty} Z_i'\varepsilon/T = 0$, the estimator which minimizes misspecification is indeed
inside this polyhedron.

Suppose that $\lim_{T \to \infty} Z_i'\varepsilon/T$ is equal to $V_i$, where $V_i$ is a constant. This
convergence of $Z_i'\varepsilon/T$ is also required to make the $\hat\beta^{j_1 \ldots j_k}$'s converge.
Instrumental variables are inherently underidentified asymptotically since one
cannot learn the $\beta$'s and the $m$ $V_i$'s. The usual "identifying" assumption is
that the $V_i$ are zero. This cannot of course be true of all the $V_i$ if the model
fails a specification test. Here I assume that econometricians do not know the
values of $V_i$. Instead they are willing to entertain a prior distribution over
$V$. Since it is felt that the instruments are reasonably close to being valid,
the mean of this prior is zero. On the other hand the prior variance of $V_i$ is
nonzero, and this is the weakening of the standard assumption.

This randomness of $V_i$ can be interpreted as follows. As long as the random
variable $Z_{it}\varepsilon_t$ is stationary, $V_i$ converges almost surely to the expected value
of $Z_{it}\varepsilon_t$ conditional on an invariant set $J$. If, in addition, the variables
$Z_{it}\varepsilon_t$ are ergodic, the invariant sets have either probability zero or one. In
this case $V_i$ converges almost surely to the unconditional mean of $Z_{it}\varepsilon_t$. On
the other hand, suppose $Z_{it}\varepsilon_t$ is not ergodic. Then there are nontrivial
invariant subsets of the set $\Omega$ of underlying states of the world. These subsets have
the property that, once the economy starts in one of these subsets, it never
reaches outside the subset. Hence $V_i$ depends explicitly on which subset
the economy starts in. Then, even if the unconditional mean of $Z_{it}\varepsilon_t$ is zero,
the asymptotic value of $V_i$ can be treated as a random variable whose realization
depends on the actual invariant subset in which the economy is stuck. The
probability of this realization depends on the prior probability of this
particular invariant subset.

I do not, however, consider completely general priors. Instead, I assume
that the prior covariance matrix of $V$, $\Sigma$, is diagonal. This assumption has the
advantage of parsimony. If, before encountering a rejection with a
specification test, an econometrician considered that a set of instruments were strictly
valid, it is hard to imagine that he/she knows after the rejection how the
misspecification due to one instrument depends on the misspecification caused by
another. This suggests as a natural starting point the assumption that the
misspecifications are uncorrelated. If, in a particular application, economic
theory predicts the off-diagonal terms of $\Sigma$, it should obviously be applied.
On the other hand I am unaware of theories which make this type of prediction.
Such theories would have to deal explicitly with the invariant sets of $\Omega$.

The standard errors in variables case considered for instance by Leamer
[1978] has a residual which can be decomposed in two additive parts. The first
part (the structural one) is only correlated with the dependent variable $Y$ while
the second part (the measurement error) is correlated only with $X$. Then,
treating $Y$ and $X$ as instruments, the covariance of $Y_t\varepsilon_t$ with $X_t\varepsilon_t$ is zero and,
a fortiori in the iid case, so is the covariance between $Y'\varepsilon/T$ and $X'\varepsilon/T$.
So this example satisfies my diagonal covariance assumption. However, in
general, it isn't required for $Z_{it}\varepsilon_t$ to be uncorrelated with $Z_{jt}\varepsilon_t$
for $Z_i'\varepsilon/T$ to be uncorrelated with $Z_j'\varepsilon/T$.

I now establish that under these assumptions about the $V_i$, the asymptotic
variance covariance matrix of $\hat\beta$ gets minimized by picking an estimate strictly
inside the polyhedron of estimates from exactly identified submodels.² This
variance covariance matrix is given by the limit as $T$ goes to infinity of
$(\hat\beta - \beta)(\hat\beta - \beta)'$. When the model is correctly specified this is simply zero and we
focus on the "first order" variance covariance matrix given by the expected
value of $T(\hat\beta - \beta)(\hat\beta - \beta)'$. Here, however, since the $V_i$ are random from the point
of view of the econometrician, $\hat\beta$ is a random variable and $E(\hat\beta - \beta)(\hat\beta - \beta)'$ is
well defined. Instead, in the presence of this type of misspecification,
$E\,T(\hat\beta - \beta)(\hat\beta - \beta)'$ blows up almost surely as $T$ goes to infinity.
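Proposition 3

If $\lim_{T \to \infty} Z'\varepsilon/T$ has mean zero and a diagonal variance covariance matrix $\Sigma$,
the instrumental variable estimator which minimizes the expectation of
$(\hat\beta - \beta)(\hat\beta - \beta)'$ is given by (7) with all the $\alpha$'s between zero and one.

Proof

From (3),

$$\hat\beta - \beta = \left(W \frac{Z'X}{T}\right)^{-1} W V$$

where the typical element of $V$, $V_i$, is given by $\lim_{T \to \infty} Z_i'\varepsilon/T$. Then, the
asymptotic variance covariance matrix of $\hat\beta$ is

$$E(\hat\beta - \beta)(\hat\beta - \beta)' = E \left(W \frac{Z'X}{T}\right)^{-1} W V V' W' \left(\frac{X'Z}{T} W'\right)^{-1}$$

which is clearly minimized for

$$W = \frac{X'Z}{T}\, \Sigma^{-1} \tag{14}$$

Thus the matrix composed of the columns $j_1, \ldots, j_k$ of $W$ is given by:

$$\begin{bmatrix}
W_{1,j_1} & \cdots & W_{1,j_k} \\
\vdots & & \vdots \\
W_{k,j_1} & \cdots & W_{k,j_k}
\end{bmatrix}
= \frac{1}{T}
\begin{bmatrix}
X_1'Z_{j_1} & \cdots & X_1'Z_{j_k} \\
\vdots & & \vdots \\
X_k'Z_{j_1} & \cdots & X_k'Z_{j_k}
\end{bmatrix}
\begin{bmatrix}
1/\sigma_{j_1}^2 & & 0 \\
 & \ddots & \\
0 & & 1/\sigma_{j_k}^2
\end{bmatrix}$$

where $\sigma_{j_i}^2$ is the expected value of $V_{j_i}^2$. Each product of determinants in (7)
is therefore the squared determinant of the corresponding $k$ rows of $Z'X$
multiplied by $T^{-k} \prod_i 1/\sigma_{j_i}^2$. Thus the numerator as well as each of
the elements of the denominator of (7), evaluated at (14), are positive. Hence
all $\alpha$'s are positive and less than one.

Q.E.D.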
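
Both parts of the argument can be checked numerically. The sketch below uses hypothetical values for $\lim Z'X/T$ (called `M`) and for the prior variances $\sigma_i^2$, computes the weights (7) under the $W$ of (14), and compares the resulting asymptotic covariance matrix with the one implied by an arbitrary alternative $W$.

```python
import itertools
import numpy as np

def alphas_for(W, M):
    """Weights (7) on the exactly identified submodels; M stands in for lim Z'X/T (m x k)."""
    k, m = W.shape
    terms = np.array([
        np.linalg.det(W[:, list(J)]) * np.linalg.det(M[list(J), :])
        for J in itertools.combinations(range(m), k)
    ])
    return terms / terms.sum()

def avar(W, M, sigma2):
    """(W M)^{-1} W Sigma W' (M'W')^{-1} for a diagonal Sigma with diagonal sigma2."""
    A = np.linalg.solve(W @ M, W)      # (W M)^{-1} W, a k x m matrix
    return (A * sigma2) @ A.T          # scales column j of A by sigma2[j]

rng = np.random.default_rng(3)
k, m = 2, 4
M = rng.normal(size=(m, k))                  # hypothetical cross moments
sigma2 = rng.uniform(0.5, 2.0, size=m)       # hypothetical prior variances of the V_i

W_opt = M.T / sigma2                         # equation (14): (X'Z/T) Sigma^{-1}
W_other = rng.normal(size=(k, m))            # an arbitrary weighting matrix

a_opt = alphas_for(W_opt, M)
a_other = alphas_for(W_other, M)
diff = avar(W_other, M, sigma2) - avar(W_opt, M, sigma2)
```

Under (14) every entry of `a_opt` lies between zero and one, while `diff` is positive semidefinite, which is the sense in which (14) minimizes the covariance matrix.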

Note that the two stage least squares estimator becomes optimal if $E(VV')$
is proportional to $(Z'Z)/T$.

If $\Sigma$ were known up to a multiplicative constant, the optimal estimator
of $\beta$ would use the weighting matrix given by (14) with the population moments
replaced by the sample moments. For instance, it might be thought that once
the $Z_i$ are normalized to have the same mean, the $\sigma_i^2$ are all equal. Then the
optimal estimator of $\beta$ is simply $(X'Z\,Z'X)^{-1}(X'Z\,Z'Y)$. If, on the other hand,
information on the $\sigma_i^2$ is unavailable, then it is better to analyze only the
bounds given by the polyhedron of exactly identified submodels.

It might be thought that two stage least squares, which is optimal when the
model is correctly specified and the $\varepsilon$'s are iid, produces estimates which are
at least inside this polyhedron. The following example, based on the setup of
(8), shows that this isn't necessarily true.

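Suppose that:

$$\lim_{T \to \infty} \frac{X'X}{T} = \lim_{T \to \infty} \frac{Z_1'X}{T} = \lim_{T \to \infty} \frac{Z_2'X}{T} = 1 \tag{15}$$

$$\lim_{T \to \infty} \frac{Z_1'Z_1}{T} = 4, \qquad
\lim_{T \to \infty} \frac{Z_2'Z_2}{T} = 2, \qquad
\lim_{T \to \infty} \frac{Z_1'Z_2}{T} = 2.7 \tag{16}$$

where (15) is obtained from normalization. This example naturally has a
positive definite second moment matrix. The weighting matrix defined in (9)
becomes:

$$\begin{bmatrix} a & b \\ b & c \end{bmatrix}
= \frac{1}{0.71}
\begin{bmatrix} 2 & -2.7 \\ -2.7 & 4 \end{bmatrix}$$

and $\phi$, given by (10), is $-1.17$. As the correlation between $Z_1$ and $Z_2$
goes up with fixed variances, $\phi$ continues to fall. The correlation in this
example is slightly above .95, which isn't unusual for macroeconomic time
series.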
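
The arithmetic of this example can be reproduced directly from the limits in (15) and (16). The sketch below inverts the second moment matrix of the instruments to obtain the two stage least squares weights and evaluates (10); the variable names are illustrative.

```python
import numpy as np

ZZ = np.array([[4.0, 2.7],
               [2.7, 2.0]])          # lim Z'Z/T from (16)
ZX = np.array([1.0, 1.0])            # lim Z'X/T from (15)

W = np.linalg.inv(ZZ)                # two stage least squares weighting matrix in (9)
a, b, c = W[0, 0], W[0, 1], W[1, 1]

num = (a * ZX[0] + b * ZX[1]) * ZX[0]
den = num + (b * ZX[0] + c * ZX[1]) * ZX[1]
phi = num / den                      # equation (10)

corr = ZZ[0, 1] / np.sqrt(ZZ[0, 0] * ZZ[1, 1])   # correlation between Z_1 and Z_2
print(round(float(phi), 2), round(float(corr), 4))
```

Since $\phi$ is negative, the two stage least squares estimate puts a weight above one on $\hat\beta^2$ and so lies outside the interval spanned by the two submodel estimates whenever they differ.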

Conclusions 

This paper has shown that, if one believes that the biases introduced by
the correlation of the instruments with the errors are independent, one should
concentrate on the polyhedron composed of the estimates from the exactly
identified submodels. The "best" estimator of $\beta$ is inside this polyhedron.
Moreover, the size of the polyhedron gives an idea of the economic importance
of the misspecification. On the other hand, if one is unwilling to impose any
a priori structure on the covariance matrix of $V$, it becomes essentially
impossible to learn about the $\beta$'s when the model fails a test of its
overidentifying restrictions. This weakness of inference must be contrasted
with the optimistic results of White (1982). He shows that in the maximum
likelihood context the parameters converge asymptotically to a unique vector
even when the model is misspecified. Moreover, in the iid case standard
inference itself remains unperturbed under misspecification. Similar results are
presented for least squares in White [1980 a,b]. Maximum likelihood and least
squares have the advantage of being well specified optimization problems, which
tend to have well behaved solutions. On the other hand, instrumental
variables procedures are not well specified optimization problems until weighting
matrices have been selected. Unfortunately, standard weighting matrices like
those of two stage least squares appear to have desirable properties only when
the model is correctly specified.

It might be thought that weighted least squares, which is considered by
White [1980 a,b], is also not a well specified optimization problem in this
sense. Indeed if the model is sufficiently misspecified, arbitrary parameter
values can probably be obtained by varying the weighting matrix. However, at
least for prediction purposes, White [1980 a,b] shows that weighted least
squares is always dominated by unweighted least squares. So this limitation
of weighted least squares appears to be much less severe than the limitation
of instrumental variables discussed here.

FOOTNOTES


1 Rejections are also reported in Diamond and Hausman (1983) and Dubin and
McFadden (1983). However, these authors' favored estimates are not subject to
specification tests.

2 This is akin to Leamer's (1978) observation that in his errors in
variables case the best estimator of $\beta$ lies between the estimate obtained by
regressing $Y$ on $X$ and the inverse of the coefficient obtained by regressing $X$
on $Y$.